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Abstract — We propose a new method for reconstruction of 
sparse signals with and without noisy perturbations, termed the 
subspace pursuit algorithm. The algorithm has two important 
characteristics: low computational complexity, comparable to 
that of orthogonal matching pursuit techniques when applied 
to very sparse signals, and reconstruction accuracy of the same 
order as that of LP optimization methods. The presented analysis 
shows that in the noiseless setting, the proposed algorithm can 
exactly reconstruct arbitrary sparse signals provided that the 
sensing matrix satisfies the restricted isometry property with a 
constant parameter. In the noisy setting and in the case that 
the signal is not exactly sparse, it can be shown that the mean 
squared error of the reconstruction is upper bounded by constant 
multiples of the measurement and signal perturbation energies. 

Index Terms — Compressive sensing, orthogonal matching pur- 
suit, reconstruction algorithms, restricted isometry property, 
sparse signal reconstruction. 

I. Introduction 

Compressive sensing (CS) is a sampling method closely 
connected to transform coding which has been widely used 
in modern communication systems involving large scale data 
samples. A transform code converts input signals, embedded 
in a high dimensional space, into signals that lie in a space 
of significantly smaller dimensions. Examples of transform 
coders include the well known wavelet transforms and the 
ubiquitous Fourier transform. 

Compressive sensing techniques perform transform cod- 
ing successfully whenever applied to so-called compressible 
and/or if -sparse signals, i.e., signals that can be represented by 
K <C N significant coefficients over an TV-dimensional basis. 
Encoding of a if-sparse, discrete-time signal x of dimension 
N is accomplished by computing a measurement vector y that 
consists of m -C N linear projections of the vector x. This 
can be compactly described via 

y = *x. 

Here, <& represents an m x N matrix, usually over the field 
of real numbers. Within this framework, the projection basis 
is assumed to be incoherent with the basis in which the signal 
has a sparse representation jT). 

Although the reconstruction of the signal x 6 M. N from the 
possibly noisy random projections is an ill-posed problem, the 
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strong prior knowledge of signal sparsity allows for recovering 
x using m <C N projections only. One of the outstanding 
results in CS theory is that the signal x can be reconstructed 
using optimization strategies aimed at finding the sparsest 
signal that matches with the m projections. In other words, 
the reconstruction problem can be cast as an lo minimization 
problem (2). It can be shown that to reconstruct a If -sparse 
signal x, lo minimization requires only m — 2K random pro- 
jections when the signal and the measurements are noise-free. 
Unfortunately, the Iq optimization problem is NP-hard. This 
issue has led to a large body of work in CS theory and practice 
centered around the design of measurement and reconstruction 
algorithms with tractable reconstruction complexity. 

The work by Donoho and Candes et. al. [Q, 0, 0, 
demonstrated that CS reconstruction is, indeed, a polynomial 
time problem - albeit under the constraint that more than 
2K measurements are used. The key observation behind these 
findings is that it is not necessary to resort to Iq optimization 
to recover x from the under-determined inverse problem; a 
much easier l\ optimization, based on Linear Programming 
(LP) techniques, yields an equivalent solution, as long as the 
sampling matrix 3? satisfies the so called restricted isometry 
property (RIP) with a constant parameter. 

While LP techniques play an important role in designing 
computationally tractable CS decoders, their complexity is 
still highly impractical for many applications. In such cases, 
the need for faster decoding algorithms - preferably operating 
in linear time - is of critical importance, even if one has 
to increase the number of measurements. Several classes of 
low-complexity reconstruction techniques were recently put 
forward as alternatives to linear programming (LP) based 
recovery, which include group testing methods (6), and al- 
gorithms based on belief propagation J7]. 

Recently, a family of iterative greedy algorithms received 
significant attention due to their low complexity and simple 
geometric interpretation. They include the Orthogonal Match- 
ing Pursuit (OMP), the Regularized OMP (ROMP) and the 
Stagewise OMP (StOMP) algorithms. The basic idea behind 
these methods is to find the support of the unknown signal 
sequentially. At each iteration of the algorithms, one or several 
coordinates of the vector x are selected for testing based 
on the correlation values between the columns of and 
the regularized measurement vector. If deemed sufficiently 
reliable, the candidate column indices are subsequently added 
to the current estimate of the support set of x. The pursuit 
algorithms iterate this procedure until all the coordinates in 
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the correct support set are included in the estimated support 
set. The computational complexity of OMP strategies depends 
on the number of iterations needed for exact reconstruction: 
standard OMP always runs through K iterations, and there- 
fore its reconstruction complexity is roughly O (KraN) (see 
Section IIV-CI for details). This complexity is significantly 
smaller than that of LP methods, especially when the signal 
sparsity level K is small. However, the pursuit algorithms do 
not have provable reconstruction quality at the level of LP 
methods. For OMP techniques to operate successfully, one 
requires that the correlation between all pairs of columns 
of $ is at most 1/2K [|8|, which by the Gershgorin Circle 
Theorem [9| represents a more restrictive constraint than 
the RIP. The ROMP algorithm 01j) can reconstruct all K- 
sparse signals provided that the RIP holds with parameter 
^2K < 0.06/y/\ogK, which strengthens the RIP requirements 
for Zi-linear programming by a factor of ^/log K . 

The main contribution of this paper is a new algorithm, 
termed the subspace pursuit (SP) algorithm. It has provable 
reconstruction capability comparable to that of LP methods, 
and exhibits the low reconstruction complexity of matching 
pursuit techniques for very sparse signals. The algorithm can 
operate both in the noiseless and noisy regime, allowing 
for exact and approximate signal recovery, respectively. For 
any sampling matrix <fr satisfying the RIP with a constant 
parameter independent of K, the SP algorithm can recover 
arbitrary Jf-sparse signals exactly from its noiseless mea- 
surements. When the measurements are inaccurate and/or the 
signal is not exactly sparse, the reconstruction distortion is 
upper bounded by a constant multiple of the measurement 
and/or signal perturbation energy. For very sparse signals 
with K < const • VN, which, for example, arise in certain 
communication scenarios, the computational complexity of the 
SP algorithm is upper bounded by O (mNK), but can be 
further reduced to O (mN log K) when the nonzero entries 
of the sparse signal decay slowly. 

The basic idea behind the SP algorithm is borrowed from 
coding theory, more precisely, the A* order-statistic algo- 
rithm IfTTI for additive white Gaussian noise channels. In 
this decoding framework, one starts by selecting the set of 
K most reliable information symbols. This highest reliability 
information set is subsequently hard-decision decoded, and 
the metric of the parity checks corresponding to the given 
information set is evaluated. Based on the value of this 
metric, some of the low-reliability symbols in the most reliable 
information set are changed in a sequential manner. The 
algorithm can therefore be seen as operating on an adaptively 
modified coding tree. If the notion of "most reliable symbol" 
is replaced by "column of sensing matrix exhibiting highest 
correlation with the vector y", the notion of "parity-check 
metric" by "residual metric", then the above method can be 
easily changed for use in CS reconstruction. Consequently, 
one can perform CS reconstruction by selecting a set of K 
columns of the sensing matrix with highest correlation that 
span a candidate subspace for the sensed vector. If the distance 
of the received vector to this space is deemed large, the 
algorithm incrementally removes and adds new basis vectors 
according to their reliability values, until a sufficiently close 



candidate word is identified. SP employs a search strategy in 
which a constant number of vectors is expurgated from the 
candidate list. This feature is mainly introduced for simplicity 
of analysis: one can easily extend the algorithm to include 
adaptive expurgation strategies that do not necessarily operate 
on fixed-sized lists. 

In compressive sensing, the major challenge associated with 
sparse signal reconstruction is to identify in which subspace, 
generated by not more than K columns of the matrix <fr, 
the measured signal y lies. Once the correct subspace is 
determined, the non-zero signal coefficients are calculated by 
applying the pseudoinversion process. The defining character 
of the SP algorithm is the method used for finding the K 
columns that span the correct subspace: SP tests subsets of 
K columns in a group, for the purpose of refining at each 
stage an initially chosen estimate for the subspace. More 
specifically, the algorithm maintains a list of K columns of €>, 
performs a simple test in the spanned space, and then refines 
the list. If y does not lie in the current estimate for the correct 
spanning space, one refines the estimate by retaining reliable 
candidates, discarding the unreliable ones while adding the 
same number of new candidates. The "reliability property" is 
captured in terms of the order statistics of the inner products 
of the received signal with the columns of <&, and the subspace 
projection coefficients. 

As a consequence, the main difference between ROMP and 
the SP reconstruction strategy is that the former algorithm 
generates a list of candidates sequentially, without back- 
tracing: it starts with an empty list, identifies one or several 
reliable candidates during each iteration, and adds them to 
the already existing list. Once a coordinate is deemed to be 
reliable and is added to the list, it is not removed from it 
until the algorithm terminates. This search strategy is overly 
restrictive, since candidates have to be selected with extreme 
caution. In contrast, the SP algorithm incorporates a simple 
method for re-evaluating the reliability of all candidates at 
each iteration of the process. 

At the time of writing this manuscript, the authors became 
aware of the related work by J. Tropp, D. Needell and R. Ver- 
shynin fl2l . describing a similar reconstruction algorithm. The 
main difference between the SP algorithm and the CoSAMP 
algorithm of lfl2l is in the manner in which new candidates are 
added to the list. In each iteration, in the SP algorithm, only K 
new candidates are added, while the CoSAMP algorithm adds 
2K vectors. This makes the SP algorithm computationally 
more efficient, but the underlying analysis more complex. In 
addition, the restricted isometry constant for which the SP 
algorithm is guaranteed to converge is larger than the one 
presented in fl2l . Finally, this paper also contains an analysis 
of the number of iterations needed for reconstruction of a 
sparse signal (see Theorem [6] for details), for which there is 
no counterpart in the CoSAMP study. 

The remainder of the paper is organized as follows. Sec- 
tion |n] introduces relevant concepts and terminology for de- 
scribing the proposed CS reconstruction technique. Section Hill 
contains the algorithmic description of the SP algorithm, along 
with a simulation-based study of its performance when com- 
pared with OMP, ROMP, and LP methods. SectionHVlcontains 
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the main result of the paper pertaining to the noiseless setting: 
a formal proof for the guaranteed reconstruction performance 
and the reconstruction complexity of the SP algorithm. Sec- 
tion [V] contains the main result of the paper pertaining to the 
noisy setting. Concluding remarks are given in Section [VI] 
while proofs of most of the theorems are presented in the 
Appendix of the paper. 

II. Preliminaries 

A. Compressive Sensing and the Restricted Isometry Property 

Let supp(x) denote the set of indices of the non-zero 
coordinates of an arbitrary vector x = (xj., . . . , xn), and let 
|supp(x)| = || ■ ||o denote the support size of x, or equivalently, 
its Iq normQ. Assume next that x G R w is an unknown signal 
with |supp(x)| < K, and let y G M. m be an observation of x 
via M linear measurements, i.e., 



ixN 



<I?X, 



is henceforth referred to as the sampling 



where $ G 
matrix. 

We are concerned with the problem of low-complexity 
recovery of the unknown signal x from the measurement y. 
A natural formulation of the recovery problem is within an l 
norm minimization framework, which seeks a solution to the 
problem 

min ||x|| subject to y = <I?x. 

Unfortunately, the above Iq minimization problem is NP-hard, 
and hence cannot be used for practical applications j3), J4J. 

One way to avoid using this computationally intractable for- 
mulation is to consider a ^-regularized optimization problem, 



min HxHj subject to y = <&x, 



where 



JV 

ll x lli = N 

i=l 

denotes the l\ norm of the vector x. 

The main advantage of the li minimization approach is that 
it is a convex optimization problem that can be solved effi- 
ciently by linear programming (LP) techniques. This method 
is therefore frequently referred to as li-LP reconstruction 0J, 
[13], and its reconstruction complexity equals O (m 2 7V 3//2 ) 
when interior point methods are employed [ 14] . See [ 15], 1 16 1, 
ifTTl for other methods to further reduce the complexity of l\- 
LP. 

The reconstruction accuracy of the li-LP method is de- 
scribed in terms of the restricted isometry property (RIP), 
formally defined below. 



Definition 1 (Truncation): Let 4> G 



prnxN 



x G 



pN 



and 



I C {1, • • • , N}. The matrix consists of the columns of 
<I? with indices i G I, and x/ is composed of the entries of x 
indexed by i € I. The space spanned by the columns of <I>/ 
is denoted by span («&/). 

Definition 2 (RIP): A matrix $ G R mxN is said to satisfy 
the Restricted Isometry Property (RIP) with parameters (K, 8) 

'We interchangeably use both notations in the paper. 



for K < m, < 8 < 1, If for all index sets J c {1, • • • , N} 
such that |/| < K and for all q G R' 7 ', one has 

(l-«)||q||2<ll*/q|l2<(l + *)llq|l2. 

We define 8k, the RIP constant, as the infimum of all 
parameters 8 for which the RIP holds, i.e. 

8 K := inf [8: (1 - 8) \\qf 2 < ||* jq ||» < (1 + 8) ||q|| 

K, VqGK 1 ' 1 }. 
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Remark 1 (RIP and eigenvalues): If a sampling matrix 
<fr G R mxN satisfies the RIP with parameters (K,8k), then 
for all I C {1, • • • , N} such that |/| < K, it holds that 

1 - 8 K < A mi „ (*}*/) < A max (#**,) < 1 + 8 K , 

where A m in (<&J<&j) and A max denote the minimal 

and maximal eigenvalues of <fr}3>/, respectively. 

Remark 2 (Matrices satisfying the RIP): Most known fam- 
ilies of matrices satisfying the RIP property with optimal or 
near-optimal performance guarantees are random. Examples 
include: 

1) Random matrices with i.i.d. entries that follow either 
the Gaussian distribution, Bernoulli distribution with 
zero mean and variance 1/n, or any other distribution 
that satisfies certain tail decay laws. It was shown in 
iTHl that the RIP for a randomly chosen matrix from 
such ensembles holds with overwhelming probability 
whenever 

K<C m 
~ log (N/m) ' 

where C is a function of the RIP constant. 

2) Random matrices from the Fourier ensemble. Here, 
one selects m rows from the N x N discrete Fourier 
transform matrix uniformly at random. Upon selection, 
the columns of the matrix are scaled to unit norm. The 
resulting matrix satisfies the RIP with overwhelming 
probability, provided that 

K < C- 



(logJV) 

where C depends only on the RIP constant. 
There exists an intimate connection between the LP recon- 
struction accuracy and the RIP property, first described by 
Candes and Tao in [3]. If the sampling matrix <I> satisfies the 
RIP with constants 8k, 82K, and ^3^, such that 



">2K 



83K < 1, 



(1) 



then the /i-LP algorithm will reconstruct all X-sparse signals 
exactly. This sufficient condition ([TJ can be improved to 



S 2K <V2-l, 



(2) 



as demonstrated in 1 1 8 1 . 

For subsequent derivations, we need two results summarized 
in the lemmas below. The first part of the claim, as well as a 
related modification of the second claim also appeared in 011 , 
ifTOl . For completeness, we include the proof of the lemma in 
Appendix [A] 
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Lemma 1 (Consequences of the RIP): 

1) (Monotonicity of 6k) For any two integers K < K' , 

5k < 5k>- 

2) (Near-orthogonality of columns) Let I, J C {1, • • ■ , N} 
be two disjoint sets, If] J — tfi. Suppose that £m_|_ij-i < 
1. For arbitrary vectors a S R' 7 ' and b £ R' J I, 

K* I a,#jb)|<J m+ , J ,||a|| a ||b|| 2) 

and 

\\** I * J b\\ 2 <6 w+m \\b\\ 2 . 

The lemma implies that 6k < #2K < ^3if> which conse- 
quently simplifies (fl]i to 5 3 k < 1/3. Both (fl} and (O represent 
sufficient conditions for exact reconstruction. 

In order to describe the main steps of the SP algorithm, we 
introduce next the notion of the projection of a vector and its 
residue. 

Definition 3 (Projection and Residue): Let y G R m and 
<!>/ 6 R mx l / L Suppose that is invertible. The projection 

of y onto span(<frj) is defined as 

y p = proj (y, *j) := */*|y, 

where 

:= l*}*/)- 1 *} 

denotes the pseudo-inverse of the matrix and * stands for 
matrix transposition. 

The residue vector of the projection equals 

y r = resid (y, */) := y - y p . 

We find the following properties of projections and residues 
of vectors useful for our subsequent derivations. 
Lemma 2 (Projection and Residue): 

1) (Orthogonality of the residue) For an arbitrary vector 
y e R m , and a sampling matrix 3>/ S M. mxK of full 
column rank, let y r = resid (y, <&/). Then 

*Jyr = o. 

2) (Approximation of the projection residue) Consider a 
matrix * e R mxAr . Let /, J C {l,---iV} be two 
disjoint sets, lC]J = <fi, and suppose that <5|/| + |j| < 1. 
Furthermore, let y £ span (<&/), y p = proj (y, $j) and 
y r = resid (y,*j). Then 

llyplla < i * m+|J| — W2. (3) 

1 — °max(|7|,|J|) 

and 

fi- , ' m+|J| )lly|l a <lly r || 2 <l|y|| a . (4) 

The proof of Lemma [2] can be found in Appendix [B] 



III. The SP Algorithm 
The main steps of the SP algorithm are summarized below@ 



Algorithm 1 Subspace Pursuit Algorithm 

Input: K, <&, y 

Initialization: 

1) T° = {K indices corresponding to the largest magni- 
tude entries in the vector <fr*y}. 

2) y° = resid (y, * f0 ). 

Iteration: At the £ th iteration, go through the following steps 

1) T e = T e ~ 1 {J{K indices corresponding to the largest 
magnitude entries in the vector $*y^ -1 }. 

2) Set x p = #t f y. 

3) T = {K indices corresponding to the largest elements 
of x p }. 

4) yf = resid (y,$ T *). 

5) If |[y*|| 2 > ||yf _1 || 2 , let T l = T l ~ x and quit the 
iteration. 

Output: 

1) The estimated signal x, satisfying x.n... ,n}-t* = 
and x T e = & Tt y. 



A schematic diagram of the SP algorithm is depicted in 
Fig. Q2t>). For comparison, a diagram of OMP-type methods 
is provided in Fig. Q2 a )- The subtle, but important, differ- 
ence between the two schemes lies in the approach used to 
generate T , the estimate of the correct support set T. In 
OMP strategies, during each iteration the algorithm selects 
one or several indices that represent good partial support 
set estimates and then adds them to T e . Once an index is 
included in T , it remains in this set throughout the remainder 
of the reconstruction process. As a result, strict inclusion 
rules are needed to ensure that a significant fraction of the 
newly added indices belongs to the correct support T. On 
the other hand, in the SP algorithm, an estimate T e of size 
K is maintained and refined during each iteration. An index, 
which is considered reliable in some iteration but shown to be 
wrong at a later iteration, can be added to or removed from the 
estimated support set at any stage of the recovery process. The 
expectation is that the recursive refinements of the estimate of 
the support set will lead to subspaces with strictly decreasing 
distance from the measurement vector y. 

We performed extensive computer simulations in order to 
compare the accuracy of different reconstruction algorithms 
empirically. In the compressive sensing framework, all sparse 
signals are expected to be exactly reconstructed as long as the 
level of the sparsity is below a certain threshold. However, 
the computational complexity to test this uniform reconstruc- 
tion ability is 0{N K ), which grows exponentially with K. 
Instead, for empirical testing, we adopt the simulation strategy 
described in J5] which calculates the empirical frequency of 

2 In Step 3) of the SP algorithm, K indices with the largest correlation 
magnitudes are used to form T l . In CoSaMP 1121, IK such indices are used. 
This small difference results in different proofs associated with Step 3) and 
different RIP constants that guarantee successful signal reconstruction. 
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New Iteration 




Correlation Cal. 



T t _ T e-i y{Several indices with 
the largest corr. magnitudes} 



(a) Iterations in OMP, Stagewise OMP, and Regularized OMP: in each 
iteration, one decides on a reliable set of candidate indices to be added 
into the list T*" 1 ; once a candidate is added, it remains in the list until the 
algorithm terminates. 



New Iteration Correlation Cal 





U{K 


ndices with 


the 1 


argest corr. 


magnitudes} 




proj (y, <b T t) 



T e = {K indices with 
the largest proj. coefficients} 



Proj. coefficients Xp 



(b) Iterations in the proposed Subspace Pursuit Algorithm: a list of K can- 
didates, which is allowed to be updated during the iterations, is maintained. 

Figure 1: Description of reconstruction algorithms for K- 
sparse signals: though both approaches look similar, the basic 
ideas behind them are quite different. 



exact reconstruction for the Gaussian random matrix ensemble. 
The steps of the testing strategy are listed below. 

1) For given values of the parameters m and N, choose a 
signal sparsity level K such that K < m/2; 

2) Randomly generate a m x N sampling matrix <fr from 
the standard i.i.d. Gaussian ensemble; 

3) Select a support set T of size |T| = K uniformly at 
random, and generate the sparse signal vector x by either 
one of the following two methods: 

a) Draw the elements of the vector x restricted to T 
from the standard Gaussian distribution; we refer 
to this type of signal as a Gaussian signal. Or, 

b) set all entries of x supported on T to ones; we 
refer to this type of signal as a zero-one signal. 

Note that zero-one sparse signals are of special interest 
for the comparative study, since they represent a partic- 
ularly challenging case for OMP-type of reconstruction 
strategies. 

4) Compute the measurement y = 3>x, apply a recon- 
struction algorithm to obtain x, the estimate of x, and 
compare x to x; 

5) Repeat the process 500 times for each K, and then 
simulate the same algorithm for different values of m 
and N. 

The improved reconstruction capability of the SP method, 
compared with that of the OMP and ROMP algorithms, is 
illustrated by two examples shown in Fig. [2] Here, the signals 
are drawn both according to the Gaussian and zero-one model, 
and the benchmark performance of the LP reconstruction 
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(a) Simulations for Gaussian sparse signals: OMP and ROMP start to fail when 
K > 19 and when K > 22 respectively, £i-LP begins to fail when K > 35, 
and the SP algorithm fails only when K > 45. 
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(b) Simulations for zero-one sparse signals: both OMP and ROMP starts to 
fail when K > 10, ^i-LP begins to fail when K > 35, and the SP algorithm 
fails when K > 29. 

Figure 2: Simulations of the exact recovery rate: compared 
with OMPs, the SP algorithm has significantly larger critical 
sparsity. 



technique is plotted as well. 

Figure|2]depicts the empirical frequency of exact reconstruc- 
tion. The numerical values on the x-axis denote the sparsity 
level K, while the numerical values on the j/-axis represent 
the fraction of exactly recovered test signals. Of particular 
interest is the sparsity level at which the recovery rate drops 
below 100% - i.e. the critical sparsity - which, when exceeded, 
leads to errors in the reconstruction algorithm applied to some 
of the signals from the given class. 

The simulation results reveal that the critical sparsity of 
the SP algorithm by far exceeds that of the OMP and ROMP 
techniques, for both Gaussian and zero-one inputs. The re- 
construction capability of the SP algorithm is comparable to 
that of the LP based approach: the SP algorithm has a slightly 
higher critical sparsity for Gaussian signals, but also a slightly 
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lower critical sparsity for zero-one signals. However, the SP 
algorithms significantly outperforms the LP method when 
it comes to reconstruction complexity. As we analytically 
demonstrate in the exposition to follow, the reconstruction 
complexity of the SP algorithm for both Gaussian and zero-one 
sparse signals is O (mN log K), whenever K < O (^/ISfj, 
while the complexity of LP algorithms based on interior point 
methods is O (m 2 iV 3 / 2 ) lfT4ll in the same asymptotic regime. 



IV. Recovery of Sparse Signals 

For simplicity, we start by analyzing the reconstruction 
performance of SP algorithms applied to sparse signals in 
the noiseless setting. The techniques used in this context, and 
the insights obtained are also applicable to the analysis of 
SP reconstruction schemes with signal or/and measurement 
perturbations. Note that throughout the remainder of the paper, 
we use the notation Si (S £ {D, L}, i E Z + ) stacked over 
an inequality sign to indicate that the inequality follows from 
Definition(£>) or Lemma (L) i in the paper. 

A sufficient condition for exact reconstruction of arbitrary 
sparse signals is stated in the following theorem. 

Theorem 1: Let x € be a if-sparse signal, and let 
its corresponding measurement be y = <&x £ M. m . If the 
sampling matrix <& satisfies the RIP with constant 



6 3K < 0.165, 



(5) 



then the SP algorithm is guaranteed to exactly recover x from 
y via a finite number of iterations. 



Remark 3: The requirement on RIP constant can be relaxed 



to 



8 3K < 0.205, 



if we replace the stopping criterion \\yf 



< 



with 



3V 2 = 0' ^his c l aml i s supported by substituting 
S 3 k < 0.205 into Equation ©. However, for simplicity of 



analysis, we adopt ||y, 
criterion. 



< 



for the iteration stopping 



Remark 4: In the original version of this manuscript, we 
proved the weaker result S 3 k < 0.06. At the time of revision 
of the paper, we were given access to the manuscript [19| by 
Needel and Tropp. Using some of the proof techniques in their 
work, we managed to improve the results in Theorem [3] and 
therefore the RIP constant of the original submission. The in- 
terested reader is referred to http://arxiv.org/abs/0803.0811v2 
for the first version of the theorem. This paper contains only 
the proof of the stronger result. 

This sufficient condition is proved by applying Theorems [2] 
and [6] The computational complexity is related to the number 
of iterations required for exact reconstruction, and is discussed 
at the end of Section llV-CI Before providing a detailed analysis 
of the results, let us sketch the main ideas behind the proof. 

We denote by x T _ T e-i and x T _ T * the residual signals 
based upon the estimates of supp(x) before and after the 
£ th iteration of the SP algorithm. Provided that the sampling 
matrix satisfies the RIP with constant (0, it holds that 

||x T _ T /|L < ||x T _ T *-i|L , 




Figure 3: After each iteration, a if -dimensional hyper-plane 
closer to y is obtained. 



which implies that at each iteration, the SP algorithm identifies 
a if -dimensional space that reduces the reconstruction error 
of the vector x. See Fig. [3] for an illustration. This observation 
is formally stated as follows. 

Theorem 2: Assume that the conditions of Theorem[T]hold. 
For each iteration of the SP algorithm, one has 



|x T _ T *|| 2 < ck ||x t _ t *-i| 



and 



where 



r ni2 



< 



ck 



1 - 2S 3K 



yv 9 y v 1 



ck 



2S 3K (l + S 3K ) 
(l-S 3K f 



(6) 



(7) 



(8) 



To prove Theorem [2] we need to take a closer look at the 
operations executed during each iteration of the SP algorithm. 
During one iteration, two basic sets of computations and 
comparisons are performed: first, given T 1 ^ 1 , K additional 
candidate indices for inclusion into the estimate of the support 
set are identified; and second, given T , K reliable indices out 
of the total 2K indices are selected to form T . In Subsections 
IIV-AI and IIV-BI we provide the intuition for choosing the se- 
lection rules. Now, let x T ^ f be the residue signal coefficient 
vector corresponding to the support set estimate T e . We have 
the following two theorems. 

Theorem 3: It holds that 



< 



25. 



3K 



(1 - s 3K y 



||x T _ T f- 



2 ■ 



The proof of the theorem is postponed to Appendix ID] 
Theorem 4: The following inequality is valid 



\'X-j'—T e 



1 + SsK „ 

' 2 " llx 



The proof of the result is deferred to Appendix [E] 
Based on Theorems [3] and |4] one arrives at the result claimed 
in Equation (0. 
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Furthermore, according to Lemmas [TJ and |2l one has 

||yf|| 2 = l|resid(y,* T *)|| 2 

= || resid ($ r _ r (X T _ T ( , $ T «) 
+resid ($ T ix T (, 3> T f)|| 2 

^^||rcsid ($ T _ T (X T _ r <, $ T «) + 0|| 2 

< || <&y_-p{Xy_y« || 2 

© / 

< V 1 + <Jk • ||x t _ t «-i|| 2 , 



(9) 



where the second equality holds by the definition of the 
residue, while (4) and (6) refer to the labels of the inequalities 
used in the bounds. In addition, 



|resid (y, * T /-i)|| 2 

jresid ($ T _ T nx T _ r n , & T , 



(4) 



1 - 5 K - 5 2 k | 

> - || *P'7 1 _7 1 £-lX7 1 _J'£— 1 | 



> 



> 



1 -5 K 
1 - 2S 2K 

l-S K 
1 - 2S 2K 



|X T _ T f-l| 



(10) 



Upon combining (0 and ([Tot , one obtains the following upper 
bound 



V < 



c k \\y r 



l 



1 - 25; 



-ck \\y r 



3K 

Finally, elementary calculations show that when S$k < 0.165, 

1 - 2d 3K 

which completes the proof of Theorem [2] 

A. Why Does Correlation Maximization Work for the SP 
Algorithm? 

Both in the initialization step and during each iteration 
of the SP algorithm, we select K indices that maximize 
the correlations between the column vectors and the residual 
measurement. Henceforth, this step is referred to as correlation 
maximization (CM). Consider the ideal case where all columns 
of *& are orthogonafl In this scenario, the signal coefficients 
can be easily recovered by calculating the correlations (vj, y) 
- i.e., all indices with non-zero magnitude are in the correct 
support of the sensed vector. Now assume that the sampling 
matrix 3? satisfies the RIP. Recall that the RIP (see Lemma 
[TJ implies that the columns are locally near-orthogonal. Con- 
sequently, for any j not in the correct support, the magnitude 
of the correlation (vj , y) is expected to be small, and more 
precisely, upper bounded by 5k+i ||x|| 2 . This seems to provide 
a very simple intuition why correlation maximization allows 
for exact reconstruction. However, this intuition is not easy 

3 Of course, in this case no compression is possible. 



to analytically justify due to the following fact. Although it 
is clear that for all indices j ^ T, the values of | (v 3 , y) | are 
upper bounded by 5k+i ||x||, it may also happen that for all 
i G T, the values of |(vj,y)| are small as well. Dealing with 
maximum correlations in this scenario cannot be immediately 
proved to be a good reconstruction strategy. The following 
example illustrates this point. 

Example 1: Without loss of generality, let 
T = {1, • • • , K}. Let the vectors v,; (i E T) be orthonormal, 
and let the remaining columns v 3 , j ^ T, of $ be constructed 
randomly, using i.i.d. Gaussian samples. Consider the 
following normalized zero-one sparse signal 

Then, for K sufficiently large, 
1 



( v *>y; 



< 1, for all 1 < i < K. 



It is straightforward to envision the existence of an index j 
T, such that 

l(v,-,y}| w 5 K +i > 



K 

The latter inequality is critical, because achieving very small 
values for the RIP constant is a challenging task. 

This example represents a particularly challenging case for 
the OMP algorithm. Therefore, one of the major constraints 
imposed on the OMP algorithm is the requirement that 



max|(v;,y)| = — L > max|(v 3 ,y) 



To meet this requirement, 5k+i has to be less than l/\/K, 
which decays fast as K increases. 

In contrast, the SP algorithm allows for the existence of 
some index j ^ T with 

rnax|(vi,y)| < |(v,-,y)|. 

As long as the RIP constant 5^k is upper bounded by the 
constant given in (0, the indices in the correct support of 
x, that account for the most significant part of the energy 
of the signal, are captured by the CM procedure. Detailed 
descriptions of how this can be achieved are provided in the 
proofs of the previously stated Theorems [3] and [5] 

Let us first focus on the initialization step. By the definition 
of the set T° in the initialization stage of the algorithm, the 
set of the K selected columns ensures that 

ll*Toy|| 2 > ||^y|| 2 ?(l-MH 2 . (11) 

Now, if we assume that the estimate T° is disjoint from 
the correct support, i.e., that T° C]T = cf>, then by the near 
orthogonality property of Lemma [TJ one has 

ll*Toy|| 2 = ll*T°*TX T || 2 < 5 2K ||x|| 2 . 

The last inequality clearly contradicts (TTTb whenever 5k < 
&2K < 1/2. Consequently, if 5 2 k < 1/2, then 



and at least one correct element of the support of x is in T°. 
This phenomenon is quantitatively described in Theorem [5] 
Theorem 5: After the initialization step, one has 

1- 8k~ 2S2K „ „ 



Tf\f- 



|x T o nT || 2 > 



1 



2 ' 



and 



||x T _ T o|L < 



2K 



] 2K 



1 + 8; 



2 A' 



The proof of the theorem is postponed to Appendix ICl 
To study the effect of correlation maximization during each 

iteration, one has to observe that correlation calculations are 

performed with respect to the vector 

yf" 1 = resid(y,$ T f-i) 

instead of being performed with respect to the vector y. 
As a consequence, to show that the CM process captures a 
significant part of residual signal energy requires an analysis 
including a number of technical details. These can be found 
in the Proof of Theorem [3] 

B. Identifying Indices Outside of the Correct Support Set 

Note that there are 2K indices in the set T e , among which 
at least K of them do not belong to the correct support set T. 
In order to expurgate those indices from T , or equivalently, 
in order to find a if -dimensional subspace of the space 
span (Qfi ) closest to y, we need to estimate these K incorrect 
indices. 

Define AT := T l — T 1 ' 1 . This set contains the K indices 
which are deemed incorrect. If AT f] T = <f>, our estimate of 
incorrect indices is perfect. However, sometimes AT |~) T ^ <f>. 
This means that among the estimated incorrect indices, there 
are some indices that actually belong to the correct support set 
T. The question of interest is how often these correct indices 
are erroneously removed from the support estimate, and how 
quickly the algorithm manages to restore them back. 

We claim that the reduction in the ||-|| 2 norm introduced by 
such erroneous expurgation is small. The intuitive explanation 
for this claim is as follows. Let us assume that all the 
indices in the support of x have been successfully captured, or 
equivalently, that T C T e . When we project y onto the space 
span (3>f.<>)> it can b e snown that its corresponding projection 
coefficient vector x p satisfies 



and that it contains at least K zeros. Consequently, the K 
indices with smallest magnitude - equal to zero - are clearly 
not in the correct support set. 

However, the situation changes when T ^ T e , or equiva- 
lently, when T — T e ^ <p. After the projection, one has 



for some nonzero e e 



View the projection coefficient 



vector x p as a smeared version of Xfe (see Fig. H] for 
illustration): the coefficients indexed by i ^ T may become 
non-zero; the coefficients indexed by i G T may experience 



1 f } 



+ 



smear 




Tf\AT 



■ 1 

■ 1 

■ 1 



J_J_ 



t 



t t t 



AT 



Figure 4: The projection coefficient vector x p is a smeared 
version of the vector x T p T , . 



changes in their magnitudes. Fortunately, the energy of this 
smear, i.e., ||e|| 2 , is proportional to the norm of the residual 
signal x T _f e , which can be proved to be small according to 
the analysis accompanying Theorem [3] As long as the smear 
is not severe, x p w ~x.fi, one should be able to obtain a 
good estimate of T |~| T e via the largest projection coefficients. 
This intuitive explanation is formalized in the previously stated 
Theorem [4] 

C. Convergence of the SP Algorithm 

In this subsection, we upper bound the number of iterations 
needed to reconstruct an arbitrary If -sparse signal using the 
SP algorithm. 

Given an arbitrary If -sparse signal x, we first arrange its 
elements in decreasing order of magnitude. Without loss of 
generality, assume that 

\xi\ > \x 2 \ >■■■> \x K \ > 0, 
and that Xj =0, V j > K. Define 



\xk\ 



mm Xj 

Ki<K 



(12) 



Let n; t denote the number of iterations of the SP algorithm 
needed for exact reconstruction of x. Then the following 
theorem upper bounds n.;t in terms of ck and p m i a . It can be 
viewed as a bound on the complexity/performance trade-off 
for the SP algorithm. 

Theorem 6: The number of iterations of the SP algorithm 
is upper bounded by 



n; t < min 



- log p mir , 
- log C K 



1, 



1.5 • K 

- log c K 



This result is a combination of Theorems [7] and (fT2l) l 4 l 
described below. 

4 The upper bound in Theorem [7] is also obtained in 1121 while the one in 
Theorem \§\ is not. 
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Theorem 7: One has 

n it < 



logp n 



Theorem 8: It can be shown that 

1.5 K 

Hit < — j ■ 

- log C K 

The proof of Theorem [JJ is intuitively clear and presented 
below, while the proof of Theorem [8] is more technical and 
postponed to Appendix [FJ 

Proof of Theorem [7} The theorem is proved by contra- 
diction. Consider T e , the estimate of T, with 



/ = 



log A, 



+ 1 



- log C K 

Suppose that T £ T e , or equivalently, T - T e ^ 



Then 



l x T-HI 2 



2 • 



/ E *? 

> nun it = pmh 
However, according to Theorem [2] 

||x T _ T *|| 2 < {c K f ||x|| 2 

^ Pmin 1 1 x| | 2 , 

where the last inequality follows from our choice of I such 
that (ck) 1 < Pmin- This contradicts the assumption T ^ T e 
and therefore proves Theorem [JJ ■ 

A drawback of TheoremUJis that it sometimes overestimates 
the number of iterations, especially when p mm <C 1. The 
example to follow illustrates this point. 

Example 2: Let K = 2, xi = 2 10 , x<i = 1, X3 = ■ ■ ■ = 
xn = 0. Suppose that the sampling matrix $ satisfies the RIP 
with ck = \- Noting that p m i n < 2~ 10 , Theorem [6] implies 
that 

na < 11. 

Indeed, if we take a close look at the steps of the SP algorithm, 
we can verify that 

n it < 1. 

After the initialization step, by Theorem [5] it can be shown 
that 



l|x T - T o|| 2 < 8 ^ Hx|| 2 <0.95||x|| 2 . 

1 + 02K 

As a result, the estimate T° must contain the index one and 
||x T _ T o|| 2 < 1. After the first iteration, since 

||x<r_Ti|| 2 < ck ||xx-t°|| < 0.95 < min \xt 
we have T C T 1 . 



This example suggests that the upper bound (UJ can be 
tightened when the signal components decay fast. Based on 



the idea behind this example, another upper bound on n- lt is 
described in Theorem [8] and proved in Appendix [F] 

It is clear that the number of iterations required for exact re- 
construction depends on the values of the entries of the sparse 
signal. We therefore focus our attention on the following three 
particular classes of sparse signals. 

1) Zero-one sparse signals. As explained before, zero-one 
signals represent the most challenging reconstruction 
category for OMP algorithms. However, this class of 
signals has the best upper bound on the convergence 
rate of the SP algorithm. Elementary calculations reveal 
that pmin = 1/ yf and that 

logiC 



2l0g(l/CA-)' 



2) Sparse signals with power-law decaying entries (also 
known as compressible sparse signals). Signals in this 
category are defined via the following constraint 

\%i I — c x i ^5 

for some constants c x > and p > 1. Compressible 
sparse signals have been widely considered in the CS 
literature, since most practical and naturally occurring 
signals belong to this class fl3l . It follows from Theo- 
rem UJ that in this case 

p log K 



log(l/Cif) 

where o ( 1 ) — > when K — ► < 



(l + o(l)), 



3) Sparse signals with exponentially decaying entries. Sig- 
nals in this class satisfy 



\Xi\ < Cr 



(13) 



for some constants c x > and p > 0. Theorem[6]implies 
that 



pK 



< J log(l/c K ) 
- I 1.5K 
{ log(l/c K ) 



if <p< 1.5 
if p > 1.5 



where again o (1) — ► as K — > 00. 
Simulation results, shown in Fig. [5] indicate that the above 
analysis gives the right order of growth in complexity with 
respect to the parameter K. To generate the plots of Fig. 
[5] we set m = 128, N = 256, and run simulations for 
different classes of sparse signals. For each type of sparse 
signal, we selected different values for the parameter K, and 
for each K, we selected 200 different randomly generated 
Gaussian sampling matrices $ and as many different support 
sets T. The plots depict the average number of iterations 
versus the signal sparsity level K, and they clearly show that 
"it = O (log (K)) for zero-one signals and sparse signals 
with coefficients decaying according to a power law, while 
n it = O (K) for sparse signals with exponentially decaying 
coefficients. 

With the bound on the number of iterations required for 
exact reconstruction at hand, the computational complexity of 
the SP algorithm can be easily estimated: it equals the com- 
plexity of one iteration multiplied by the number of iterations. 
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m=128, N=256, # of realizations=200 




' ' ' ' ' ' 

5 10 15 20 25 30 

K 

Figure 5: Convergence of the subspace pursuit algorithm for 
different signals. 

In each iteration, CM requires mN computations in general. 
For some measurement matrices with special structures, for ex- 
ample, sparse matrices, the computational cost can be reduced 
significantly. The cost of computing the projections is of the 
order of O (K 2 m), if one uses the Modified Gram-Schmidt 
(MGS) algorithm | 20l pg. 61]. This cost can be reduced 
further by "reusing" the computational results of past iterations 
within future iterations. This is possible because most practical 
sparse signals are compressible, and the signal support set 
estimates in different iterations usually intersect in a large 
number of indices. Though there are many ways to reduce 
the complexity of both the CM and projection computation 
steps, we only focus on the most general framework of the SP 
algorithm, and assume that the complexity of each iteration 
equals O (mN + mK 2 y As a result, the total complexity 
of the SP algorithm is given by O (m (N + K 2 } logi"T) for 
compressible sparse signals, and it is upper bounded by 
O (m (N + K 2 ) K) for arbitrary sparse signals. When the 
signal is very sparse, in particular, when K 2 < O (N), the 
total complexity of SP reconstruction is upper bounded by 
O (mNK) for arbitrary sparse signals and by O (mN log K) 
for compressible sparse signals (we once again point out that 
most practical sparse signals belong to this signal category 

El). 

The complexity of the SP algorithm is comparable to OMP- 
type algorithms for very sparse signals where K 2 < O(N). 
For the standard OMP algorithm, exact reconstruction always 
requires K iterations. In each iteration, the CM operation costs 
O (mN) computations and the complexity of the projection 
is marginal compared with the CM. The corresponding total 
complexity is therefore always O (mNK). For the ROMP 
and StOMP algorithms, the challenging signals in terms of 
convergence rate are also the sparse signals with exponentially 
decaying entries. When the p in ( fT3l l is sufficiently large, it can 
be shown that both ROMP and StOMP also need O (K) iter- 
ations for reconstruction. Note that CM operation is required 
in both algorithms. The total computational complexity is then 
O (mNK). 



The case that requires special attention during analysis 
is K 2 > O (N). Again, if compressible sparse signals are 
considered, the complexity of projections can be significantly 
reduced if one reuses the results from previous iterations at the 
current iteration. If exponentially decaying sparse signals are 
considered, one may want to only recover the energetically 
most significant part of the signal and treat the residual of 
the signal as noise — reduce the effective signal sparsity 
to K' <C K. In both cases, the complexity depends on the 
specific implementation of the CM and projection operations 
and is beyond the scope of analysis of this paper. 

One advantage of the SP algorithm is that the number of 
iterations required for recovery is significantly smaller than 
that of the standard OMP algorithm for compressible sparse 
signals. To the best of the authors' knowledge, there are no 
known results on the number of iterations of the ROMP and 
StOMP algorithms needed for recovery of compressible sparse 
signals. 

V. Recovery of Approximately Sparse Signals 
from Inaccurate Measurements 

We first consider a sampling scenario in which the signal 
x is if-sparse, but the measurement vector y is subjected to 
an additive noise component, e. The following theorem gives 
a sufficient condition for convergence of the SP algorithm in 
terms of the RIP constant 5$k, as well as an upper bounds on 
the recovery distortion that depends on the energy (^-norm) 
of the error vector e. 

Theorem 9 (Stability under measurement perturbations): 
Let x e l w be such that |supp(x)| < K, and let its 
corresponding measurement be y = <&x + e, where e denotes 
the noise vector. Suppose that the sampling matrix satisfies 
the RIP with parameter 

5 3K < 0.083. (14) 
Then the reconstruction distortion of the SP algorithm satisfies 

||x-x|| 2 <c' K \\e\\ 2 , 

where 

, = 1 + S 3K + She 

$3K (1 - 5zk) 

The proof of the theorem is given in Section IV-AI 
We also study the case where the signal x is only approxi- 
mately X-sparse, and the measurement y is contaminated by 
a noise vector e. To simplify the notation, we henceforth use 
x#- to denote the vector obtained from x by maintaining the 
K entries with largest magnitude and setting all other entries 
in the vector to zero. In this setting, a signal x is said to be 
approximately if-sparse if x — xa" 7^ 0. Based on Theorem 
[9] we can upper bound the recovery distortion in terms of the 
li and I2 norms of x — x^- and e, respectively, as follows. 

Corollary 1: (Stability under signal and measurement per- 
turbations) Let x S be approximately iC-sparse, and let 
y = $x + e. Suppose that the sampling matrix satisfies the 
RIP with parameter 

S 6K < 0.083. 
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Then 



Recovery Distortion (500 Realizations): m=128, N=256 



la — C 2K 



06K 



The proof of this corollary is given in Section IV-BI As 
opposed to the standard case where the input sparsity level of 
the SP algorithm equals the signal sparsity level K, one needs 
to set the input sparsity level of the SP algorithm to 2K in 
order to obtain the claim stated in the above corollary. 

Theorem|9]and CorollaryQ]provide analytical upper bounds 
on the reconstruction distortion of the noisy version of the 
SP algorithm. In addition to these theoretical bounds, we 
performed numerical simulations to empirically estimate the 
reconstruction distortion. In the simulations, we first select the 
dimension N of the signal x, and the number of measurements 
m. We then choose a sparsity level K such that K < m/2. 
Once the parameters are chosen, anmxJY sampling matrix 
with standard i.i.d. Gaussian entries is generated. For a given 
K, the support set T of size \T\ = K is selected uniformly 
at random. A zero-one sparse signal is constructed as in 
the previous section. Finally, either signal or a measurement 
perturbations are added as follows: 

1) Signal perturbations: the signal entries in T are kept un- 
changed but the signal entries outside of T are perturbed 
by i.i.d. Gaussian Af (0, cx 2 ) samples. 

2) Measurement perturbations,: the perturbation vector e is 
generated using a Gaussian distribution with zero mean 
and covariance matrix cr 2 I m , where I m denotes the m x 
to identity matrix. 

We ran the SP reconstruction process on y, 500 times for 
each K, rr 2 and a\. The reconstruction distortion ||x — x|| 2 
is obtained via averaging over all these instances, and the 
results are plotted in Fig. [6] Consistent with the findings of 
Theorem [9] and Corollary [T] we observe that the recovery dis- 
tortion increases linearly with the /2-norm of the measurement 
error. Even more encouraging is the fact that the empirical 
reconstruction distortion is typically much smaller than the 
corresponding upper bounds. This is likely due to the fact that, 
in order to simplify the expressions involved, many constants 
and parameters used in the proof were upper bounded. 

A. Recovery Distortion under Measurement Perturbations 

The first step towards proving Theorem|9]is to upper bound 
the reconstruction error for a given estimated support set T, 
as succinctly described in the lemma to follow. 

Lemma 3: Let x S M. N be a if -sparse vector, ||x|| < K, 
and let y = <l>x + e be a measurement for which <I> G W n x N 
satisfies the RIP with parameter 5k- For an arbitrary T C 



{1, • • • , N} such that 

and 
Then 



T 



< K, define x as 



x f = *ty, 



X {1,— ,N}-f ~ 0. 



I*-*II 2 < — 



1 + 6. 



3K 



3K 



J.) 

e 

jo J 1 

- 

it' 



Approx Sparse Signal : K=20 
— * — Approx Sparse Signal : K=5 
O Noisy Measurement : K=20 

Noisy Measurement : K=10 
- -it - Noisy Measurement : K=5 



10- 



10 

Perturbation Level 



Figure 6: Reconstruction distortion under signal or measure- 
ment perturbations: both perturbation level and reconstruction 
distortion are described via the I2 norm. 



The proof of the lemma is given in Appendix iGl 
Next, we need to upper bound the norm ||x T _ T i|| 2 in the 
£ th iteration of the SP algorithm. To achieve this task, we 
describe in the theorem to follow how ||x T _ T «|| 9 depends on 
the RIP constant and the noise energy ||e|| 2 . 
Theorem 10: It holds that 

2S 3K „ „ , 2(l + S 3K ) 

~2 ll x T-T«-i|| 2 



T— 



< 



(i - 5 3K y 



1 



■J3K 



\\X-T-T' || 2 ^ 



1 



J3K 



03K 



2 > 



"II2 ' 
(15) 
(16) 



and therefore, 

|| X T-T«|U < 



26 3K (1 + S 3K ) 



(1 - 03K) 

Furthermore, suppose that 

||e|| 2 < S 3K ||x T _ Tf -i|| 2 . 



4(l + ft»c) 

' (i - s 3K f 



(17) 
(18) 



Then one has 



whenever 



y 7" 9 y V 



5 3K < 0.083. 



3K 



Proof: The upper bounds in Inequalities (fT~5T > and ( [ToT l are 
proved in Appendix [Hi and HI respectively. The inequality (V7[ 
is obtained by substituting (fT5T > into ( TToT ) as shown below: 

n n , 2S 3K (1 + S 3K ) n „ 
||x T _ T ^|| 2 < — — -3— \\-x T _ T t-i\\ 2 

(1 - o 3 k) 

2(1 + S 3 k) 2 + 2(1-S 3K ) ., 

+ 7, ^2 ll e ll 2 

(1 - ha) 

28 3K (1 + 8 3K ) 

(1 - o 3 k) 
4(1 + far) „„„ 
(1 - o 3K ) 
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To complete the proof, we make use of Lemma [2] stated in 
Section |TT] According to this lemma, we have 



|y£|| 2 = l|resid(y,* T i)|| 2 

< ||resid x T _ T « , $ T <?)|| 2 

42 

< ||* T _ Tf x T _ Tf || 2 + ||e|| 2 
' llelL, 



|resid (e, <f> T e 



< + S 3K ||x T _ T f | 



(19) 



and 



K r 1 1 53 



= ||resid(y,* T /-i)|| 2 
> ||resid (#y_yf-ix T _yf-i , 
-• ||resid(e,* T *-i)|| 2 

2<$3K 



mi 

> 



> 



1 - 


- fax 


1 - 


2S 3 k 


1 - 




1 - 





^T— T f ~ lX T — T f_1 II2 — !l e ll2 



y/l — $3K ||x T _ T *-i I 



(20) 



yl - o 3K 

Apply the inequalities (O and dTHJ to £[9]l and <[20j. Nu- 
merical analysis shows that as long as 63K < 0.085, the 
right hand side of ( TT9i l is less than that of ( f20b and therefore 



This completes the proof of the theorem. 



Based on Theorem [TO] we conclude that when the SP 
algorithm terminates, the inequality < TT~8T > is violated and we 
must have 

||e|| 2 > 8 3K ||x T _ T «-i|| 2 . 
Under this assumption, it follows from Lemma [3] that 

1 1 , i + s 3K 



< 



1 — 5 3 K 5 3 K 

1 + S 3K + 5? 



1-5, 



3K 



3K 



5 3 K (1 - 5 3 k) 
which completes the proof of Theorem [9] 



B. Recovery Distortion under Signal and Measurement Per- 
turbations 

The proof of Corollary Q] is based on the following two 
lemmas, which are proved in |21| and [22 1, respectively. 

Lemma 4: Suppose that the sampling matrix 3? S R mxJV 
satisfies the RIP with parameter 8k- Then, for every x S 
one has 

1 



|*x|| a < ^l + 8 K ||x|| 2 + — 



K 



Lemma 5: Let x E M. d be if -sparse, and let Xjf denote the 
vector obtained from x by keeping its K entries of largest 
magnitude, and by setting all its other components to zero. 
Then 

11 11 ^ I'*" 1 

l|x-x K || 2 < 



2VK 



To prove the corollary, consider the measurement vector 

y = $x + e 
= *x 2 k + * (x - x 2A -) + e. 

By Theorem |9j one has 

||x - x 2if || 2 < C 6K (II* (x - x 2 k)|| 2 + ||e|| 2 ) , 
and invoking Lemma H] shows that 
||*(x-x 2if )|| 2 

]|x - x 2if | 



< V 1 + S 6K M|x-X 2if || 2 + 

Furthermore, Lemma [5] implies that 



|x-X 2Jr || 2 = \\(X-X K ) - (X-XL K ) K \\ 2 
< = ||X - XA'Hx • 



2-JK 



Therefore, 



||*(x-x 2Ar )|| a 



X - Xjf 



< y/1 + 6, 



6K- 



2VK 



x - x 2 g| 



and 



x-xk 



A' 



||x-X 2 k|| 2 < 4k (jl e ll 2 + Vl+^6K 

which completes the proof. 

VI. Conclusion 



We introduced a new algorithm, termed subspace pursuit, 
for low-complexity recovery of sparse signals sampled by ma- 
trices satisfying the RIP with a constant parameter 5 3 k- Also 
presented were simulation results demonstrating that the re- 
covery performance of the algorithm matches, and sometimes 
even exceeds, that of the LP programming technique; and, 
simulations showing that the number of iterations executed 
by the algorithm for zero-one sparse signals and compressible 
signals is of the order 0(log A). 
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Appendix 

We provide next detailed proofs for the lemmas and theo- 
rems stated in the paper. 
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A. Proof of Lemma Q] 

1) The first part of the lemma follows directly from the 
definition of 8k ■ Every vector q e R K can be extended 
to a vector q' € WL K by attaching K' — K zeros to it. 
From the fact that for all J C {1, • • • , N} such that 
| J| < K', and all q' e R K ' , one has 

(l-S K ')\\q!\\l<\\^jq!\\l<(l + S K ')\ml 
it follows that 

(i-MI|q|l2<ll*/q|l2<(i + ^')llq|l2 

for all |/| < K and q € R . Since is defined as 
the infimum of all parameters 8 that satisfy the above 
inequalities, 8 k < 8k>- 

2) The inequality 

|<*/a,*, 7 b)| <«S m +|J|N 2 l|b|| 2 

obviously holds if either one of the norms ||a|| 2 and 
||b|| 2 is zero. Assume therefore that neither one of them 
is zero, and define 

a' = a/||a|| a ,b' = b/||b|| 2 , 
x' = $/a, y' = <I?,/b. 



Note that the RIP implies that 
2(1 



<Wi./i) < II*'- 



'II 2 

y ll 2 



[*,*;] 



a' 
b' 



< 2(1 + <5 



\i\+\J\) ' 



(21) 



and similarly, 

2(l-8\ 



7| + |.7|)<||x'-y'||^ 
-b' 



< 2 1 



V\+\J\. 



We thus have 

(x',y') = 



\ X '+y'\\ 2 2 -\\ X '-y'\\l <5 



(x',y') = 



\i\+\Jh 



\ X i-yi\\ 2 2 -\\xi + yi\\ 2 2 

A ^ d \I\ + \J\> 



and therefore 

|(*ia,*jb)| 



!|a|| 2 ||b|| 2 



\(x',y')\<8 m+lJl . 



Now, 



||#J*jb|| 2 = max |q*(#J*jb)| 

q: Ilq|l2 =1 

< max Sm+m ||q|| 2 ||b|| 2 

q : Ilq|l2 =1 

= 5 \i\+\J\ ll b ll 2 . 
which completes the proof. 



B. Proof of Lemma \2\ 

1) The first claim is proved by observing that 

*^y r = (y - */ *}y 
= <£}y - <£}y = 0. 

2) To prove the second part of the lemma, let 

y p = *jx p , and y = */x. 
By Lemma [1] we have 

\<y p ,y)\ = \(*jxp,*ix)\ 

< s \i\+\j\ ll x P ll 2 ll x ll 2 

si Italia llyL 



< 



Italia h\\2 



1 - £max(|/|,|J|) 

On the other hand, the left hand side of the above 
inequality reads as 

(y P ,y) = (y P! y P + yr> = l|y P ll 2 - 

Thus, we have 



Ma ^ T^8 



\i\+\J\ 



llyll 



-max(|/|,|J|) 

By the triangular inequality, 

1 1 y r- 1 1 2 = lly-y P ll 2 > lly|l 2 - lly P ll 2 

and therefore, 

6 \i\+\J\ 



1-8 



max(|/|,|J|) 



Finally, observing that 

Italic + Italia = lly|| 2 

and ||y p || 2 > 0, we show that 



1 - 



|J|+|J1 



1-* 



max(|7|,| J|) 



|y|| 2 < Italia <lly|la- 



C. Proof of Theorem \5\ 

The first step consists in proving Inequality ( fTTT i. which 
reads as 

||*^y|| 2 >(l-MW 2 . 

By assumption, \T\ < K, so that 

||^y|| 2 = ||*J* T x|| 2 f 1 (l-^)||x|| 2 . 
According to the definition of T°, 



l*Toy|l 2 = max J2\(vi,y) 



iei 



> \\&* T y\\ 2 > (l-fo)||x|| 2 . 

The second step is to partition the estimate of the support 
set T° into two subsets: the set T° f] T, containing the indices 
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in the correct support set, and T° — T, the set of incorrectly 
selected indices. Then 



l*Toy|| 2 < 



< 



' $2K ||x|| 



(22) 



where the last inequality follows from the near-orthogonality 
property of Lemma Q] 
Furthermore, 



-+- <&yo p| T^T— T oX T — T° 



(23) 



< (l + S K ) ||x T o nT || 2 + 6 2K ||x|| 2 . 

and (l23l, one can show 



Combining the two inequalities 
that 



|*T°y|l 2 ^ (! + (5 A') ||x T o nT | 



2* 



2K ||X|| 2 



By invoking Inequality ( fTTT i it follows that 

(l-«5jc)||x|| a < (l + ^)||x r op| T || 2 + 25 2 if ||x|| 2 
Hence, 



| x t° nr|| 2 - 



i 



25' 



IK 



1 



>K 



which can be further relaxed to 



4D 1 - U 



|x T o nT || 2 > — 



2 A' 



02K 



To complete the proof, we observe that 

||x T _ T o|| 2 = 



n 2 
x L - 



< 



< 



(1 + s 2K ) 2 - (1 - 3S 2K y 



1 + 5: 



2K 



2K 



"8^ 



1 



'>2K 



D. Proof of Theorem \3\ 

In this section we show that the CM process allows for 
capturing a significant part of the residual signal power, that 
is, 



S T-T* || 2 



< Ci ||x>ji_y«-i I 



for some constant c\. Note that in each iteration, the CM 
operation is performed on the vector yfr 1 . The proof heavily 
relies on the inherent structure of yfr 1 . Specifically, in the 
following two-step roadmap of the proof, we first show how 
the measurement residue yfr 1 is related to the signal residue 
xr_p-i, and then employ this relationship to find the "energy 
captured" by the CM process. 

1) One can write y^T 1 as 



T y T f-lX r 



(24) 



for some x l r 1 e rI t U t * 1 | and x pT t-i S 
Furthermore, 

S 2 k 



2) It holds that 

1 1 ^T — T 1 | 



< 

12 - I 



>2K 



< 



26: 



3 A' 



\\X T _ T l-l\ 



\Xrp_ rpl-1 



(25) 



(i-s 3K y 

Proof: The claims can be established as below. 
1) It is clear that 

yf" 1 = resid (y,* r *-i) 



(a) 



(b) 

43 



resid ($ T _- r (-ix T _j'(-i, <&<j^-i) 
+ resid ^ T p^ T i-ix T f-^ T i~i , & T e-i) 

resid ($ t _ t hx t _ t h , + 

^Pt^T^- i Xj*_ 1 
- proj ($ T _fHXf_fH , <& T e~i) 



(c) 

— y?7 1 _7 1 ^-iXj'_j'£-i + $^-iXj, 7-f-i 



X r f_ r f£-l 

X p>T i-i 



where (a) holds because y = $ T _ T f-ix T _ T f-i + 
<& T {-} T i-ix T {-} T e-i and resid (•, is a 

linear function, (b) follows from the fact that 
3> T p T c-ix T p| T ?-i 6 span (3> T c-i), and (c) holds by 
defining 



i 1 * 



As a consequence of the RIP, 

ll X P,T f - 1 || 2 

. T i-lX T _ T (-l) 

1 



< 



^ II II ^ ^ 2A " II II 

" T-fe ll^-^Ha < ll*r-T«-i|| a ■ 

This proves the stated claim. 
2) For notational convenience, we first define 

which is the set of indices "captured" by the CM 
process. By the definition of we have 



|*T A yM a > ||*ry£ _1 | 



> *J 



} T _ Te _ 1 y r || 2 . 

(26) 

Removing the common columns between 4>t a and 
$ T _ Tf _! and noting that T\{~]T l_1 = tj>, we arrive 
at 



> $ 



J T 



(27) 
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An upper bound on the left hand side of dZTT i is given 
by 



|| -Ta-T*TUT«- iX ; 

41], 



< * 



T y T £-l UTA | ||X r || 2 



(28) 



<5 3K \\x r \\ 2 . 

A lower bound on the right hand side of d27| i can be 
derived as 



> 



^T—T 1 T—T e V r 



* T -f f *(TUT«- 1 )-(T-f«- 1 ) 



'(rUr'- 1 )-^-?'- 1 ) 



f(l-*jc) (^ _1 ) r _ri 



Substitute ([29) and (|28j into (|57). We get 



28* 



1-6 



03K ^ 2o" 3A - 

V . IX ^ 

I 114 1—0. 



A" 



3/f 



Note the explicit form of xf, 1 in d24l i. One has 



(29) 



(30) 



(31) 



and 



| 2 < ||x T _ T «-i \\ 2 + ||x p T €-i || 2 



< 1 



1 - 82K 



4H 

< 



1 



Xy_y£— 1 1 



1 - 

From OH) and d32l i. it is clear that 

2o~3A- 



(32) 



I ^T—T l 1 1 2 



(1 - s 3K y 

which completes the proof. 



!|x T _ T f-i|| 2 



E. Proof of Theorem @ 

As outlined in Section IIV-B1 let 

x p = 

be the projection coefficient vector, and let 

be the smear vector. We shall show that the smear mag- 
nitude ||e|| 2 is small, and then from this fact deduce that 
||x T _ T «|| 2 < c x T _j.« for some positive constant c. We 
proceed with establishing the validity of the following three 
claims. 

1) It can be shown that 

S3K 11 11 



< 



1 



?3K 



^T-T' || 2 - 



2) Let AT := f e - T l . One has 

||x T nAT|| 2 < 2||e|| 2 . 

This result implies that the energy concentrated in the 
erroneously removed signal components is small. 

3) Finally, 



1 



2 - 1 - s 3K 



T-T« II 2 ' 



Proof: The proofs can be summarized as follows. 
1) To prove the first claim, note that 



X p = *t f y = *t € * T x T 

— *ft*TnT' x rnr' ' 



'TfIT' 





(33) 



where the last equality follows from the definition of 
3>t Recall the definition of e, based on which we have 



€ 1 X^ X, 



12 — W"V 

(133 



T f II 2 



_1 *f t (* 



4TJ (5; 



T— || 2 ' 



2 

(34) 



1 - oW 

2) Consider an arbitrary index set T" C T^ of cardinality 
if that is disjoint from T, 

T'P|T = 

Such a set T' exists because \ f* -T > K. Since 



(35) 



we have 



(x p ) T , = (xfi) T , + e T < = + e T > 



|(x p ) T ,|| 2 < ||e|| 2 . 



On the other hand, by Step 4) of the subspace algorithm, 
AT is chosen to contain the K smallest projection 
coefficients (in magnitude). It therefore holds that 



|(x p ) AT || 2 < ||(x p ) T ,|| < ||e|| 2 



(36) 



Next, we decompose the vector (x p ) AT into a signal 
part and a smear part. Then 



|(x P ) 



■P/ATII2 



|xat + cat|| 2 
> H x at|| 2 - l|eAr|| 2 1 



l e AT|| 2 

lell 



(37) 



which is equivalent to 

||xat|| 2 < ||(x P ) AT || 2 

— II ( X p)at!I2 1 II' II--: 

Combining < f36l > and (|37l i and noting that x^t = 
x Tf|AT (x is supported on T, i.e., x^c = 0), we have 

||xrnAr|| 2 <2||e|| 2 . (38) 
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This completes the proof of the claimed result. 

3) This claim is proved by combining ([34l l and d38l l. Since 

- * 

, one has 



X^ rpg 



-""THAT' T—T e 



|x T _ T f|| 2 < ||x T p| AT || 2 
(1381) 



< 2 £ 



< 



25; 



i-s 3K 

l + S 3K I, 



<-T-f l || 2 
+ 1 



T-T e || 2 



1 - 5zK 
This proves Theorem [4] 



T— || 2 - 



and therefore 



2) Let 



i+s-l| > ^7 || x {»,---,if}||2 ' 



Sj log 3 — log 2 + 1 



log c K 



(42) 



(43) 



where |_-J denotes the floor function. Then, for any 1 < 
jo < J> a fter 

jo 

7=1 

iterations, the SP algorithm has the property that 



F. Proof of Theorem |S] 

Without loss of generality, assume that 

1*1 1 > N > ••• > > o. 

The following iterative algorithm is employed to create a 
partition of the support set T that will establish the correctness 
of the claimed result. 

Algorithm 2 Partitioning of the support set T 
Initialization: 

. Let T x = {1}, i = 1 and j = 1. 
Iteration Steps: 

• If i — K, quit the iterations; otherwise, continue. 
. If 



1 



\xA < x 



{i+l,- ,K}\\ 2 ■ 



(39) 



set Tj = Tj \J{i + 1}; otherwise, it must hold that 

- la* | > Hx^+i,...^}!^ j ( 40 ) 

and we therefore set j = j ' + 1 and Tj = {i + 1}. 
Increment the index i, i = i + 1. Continue with a new 
iteration. 



Suppose that after the iterative partition, we have 

t = t 1 \Jt 2 \J — \Jt j , 

where J < K is the number of the subsets in the partition. 
Let Sj = \Tj\, j = 1, • • • , J. It is clear that 

J 

i=i 

Then Theorem [8] is proved by invoking the following lemma. 

Lemma 6: 

1) For a given index j, let \Tj\ — s, and let 
Tj = {i, i + 1, • • • , i + s — 1} . 

Then, 

|aft+a-i-k| < 3 fc |aft+a-i| , for all < k < s - 1, 

(41) 



j=i 



More specifically, after 



1.5 K 

n= > n, < — 

-logCif 



(44) 



(45) 



iterations, the SP algorithm guarantees that T C T n . 

Proof: Both parts of this lemma are proved by mathemat- 
ical induction as follows. 

1) By the construction of Tj, 

l. ,113 

-|a*+ s _i| > ||x { i +Si ... !if }|| 2 . 
On the other hand, 

(El 



(46) 



-F i+S _ 2 | < ||x {i+s _ 1: ... iX} || 2 



< Hx^+a,...^}^ + |Xi+a-x| 



It follows that 



\xi+s-2\ < 3 |x<+ a -l| , 



or equivalently, the desired inequality (HTl) holds for = 
1. To use mathematical induction, suppose that for an 
index 1 < k < s — 1, 

Ni + a-i-^| < 3 £ for aU 1 < £ < k - 1. (47) 

Then, 

1, ,133., 

2 \ x i+s-l-k\ < < \\X{i+s-k,- ,K}\\ 

< |afi+«-fc| H h |afi+«-i| 

+ || x {i+s,--- ,K} || 2 



< (^3 fe - 1 + --- + l + -) |a; 4+s -i| 
3 fc . 

< y R+s-il ■ 
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This proves Equation ( |4TT i of the lemma. Inequality (l42l 
then follows from the observation that 



|x{v--„ff}|| 2 < \ X i\ + 



< ( 3 s - 1 



Ft+a-l| + ||X{i +Sj ... ,Jf}|| 2 



2) From d43l i. it is clear that for 1 < j < J, 

2 

Cj < . 

3^ 

According to Theorem [2] after ni iterations, 

2 

||XT-T«i|| 2 < l|x|| 2 . 

On the other hand, for any i G Ti, it follows from the 
first part of this lemma that 



3 s i 



Therefore, 



Ti C T™ 1 . 



Now, suppose that for a given jo < <A after 
rij iterations, we have 

30-1 

U T i cT '' 

3=1 



Let T =Uf =1 1 T J - Then 



< ll x T-T | 



< 



log 



„ K(log3 + l-log2) if -1.5 



- log - log 

This completes the proof of the last claim (1451 , 



G. Proof of Lemma \3\ 

The claim in the lemma is established through the following 
chain of inequalities: 



|x-x|| 2 < 



< 
< 



xf - *t (* T x T + e) 



Xf - *t (*TX T ) 



Xf 3?^ ( 3> T pi f x T pi ^ 



+ 




2 


T 



V T-T || 2 

ll X T~f || 2 
X 



T-TII2 



^f ^^_f X^_ rp 



(a) 
< 



< 



#2K 



1-5 



A" 



V T-T II 2 
1 )|| iX/ji — r f 1 1 1 



1 



l-^2K " T ~ Tl12 " ' 

where (a) is a consequence of the fact that 



2 ■ 



0. 



Denote the smallest coordinate in Tj a by i, and the 
largest coordinate in Tj a by fc. Then 

W>^||x {i ,..,K}|| 2 = ^||xT-T || 2 . 

After rij more iterations, i.e., after a total number of 
iterations equal to £ = t\ + n J0 , we obtain 

2 1 1 2 

ll X T-T' ll 2 < 3^- ||x T -T«i || 2 < 3I- I|XT-T ||2 < W\ ■ 

As a result, we conclude that 

T j0 C T £ 

is valid after ^ = J2~jLi n j iterations, which proves 
inequality (04). Now let the subspace algorithm run for 
n = n j iterations. Then, T C T n . Finally, note 

that 

. >A Si log 3 - log 2 + 1 
n = > n ? - < > 

Xlog3 + J(l -log 2) 



By relaxing the upper bound in terms of replacing 62K by 
5$k, we obtain 

11 -11 ^ 1 II II , 1 + ^ 3K 11 11 
l|x-x|| 2 < T -^-||x T _ f || 2 + T -^-||e|| 2 . 

This completes the proof of the lemma. 

H. Proof of Inequality rf75T ) 

The proof is similar to the proof given in Appendix ITJ1 We 
start with observing that 



; 1 = rcsid (y, & T e-i) 



$r(jT'- lX r + resid (e, & T e-i) 



(48) 



and 



||resid(e J * r *-i)|| 2 <||e|| a . (49) 
Again, let T A = f e - T l ~ x . Then by the definition of T A , 

* „l 

'r^TIJT*- 1 ^ || 2 

<frf resid (e, <& r «-i)|| 2 



\^r.y'r\> ||*tyM 2 



(El 



- y/l + 6 K \\e\\ 2 . 
The left hand side of ( f50b is upper bounded by 

ll^yf^L^II^T^TUT^xf- 1 ^ 

+ ||* Ta resid (e, $ T f-i)|| 2 



(50) 



< ||*t a *tut*-ix^- 1 || 2 + VT+^||e|| 2 . 

(51) 
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Combine ([50} and (ED- Then 



||*t a *tut'-ix^ 1 || 2 + 2VTT^||e|| 



(52) 



Comparing the above inequality d52l with its analogue for the 
noiseless case, d26l ). one can see that the only difference is the 
2\/\ + 8k ||e|| 2 term on the left hand side of (1521 1, Following 
the same steps as used in the derivations leading from d26l i to 
\, one can show that 



2(5 3 a p* l \\ 2 + 2y/\ + 8 K ||e|| 2 > (1 - 5 K ) ||x T _^|| 2 ■ 
Applying d32l . we get 



I ^T—T^ 1 1 2 ^ 



X T— T^ _1 1 



i 



OK 



(1 - ^3a) 

which proves the inequality ( [T5i >. 

/. Proof of Inequality ( 1761 ) 

This proof is similar to that of Theorem |4] When there are 
measurement perturbations, one has 

x p = *t f y = (* T x T + e) . 

Then the smear energy is upper bounded by 



N 2 < 



< 



*t e 



v 7 ! 



32A' 



where the last inequality holds because the largest eigenvalue 



of «1>L. satisfies 



< 



?2A 



Invoking the same technique as used for deriving d34l i. we 
have 



< 



->3K 



1 



1 



T — 



(53) 
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It is straightforward to verify that ((38) still holds, which now 
reads as 



||x T nAT|| 2 < 2||e|| 2 . 
Combining ( f53l and d54l i. one has 



|x T _ T £|| 2 < ||x T p AT || 2 + ||x T _j, 4 || 2 

. 1 + S3K || „ . "2 

< I Xr, 



1 - foli 

which proves the claimed result 



T — 



2 VT=5: 



(54) 



2A" 
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